Fetch Reordering and Partitioning of Execution Resources for Atomic Instruction Blocks with Control Flow Assertions by Yen-Ting Tony
Abstract
The rePLay framework provides a mechanism upon which a variety of code optimizations can be deployed as an application executes. In this thesis, two optimizations are explored. First, the order in which instructions are fetched is optimized. Performance with an optimized schedule is shown to improve by 2.63%. The second optimization is to partition instructions for a clustered microarchitecture. This thesis demonstrates that when such a microarchitecture is implemented along with an optimized fetch schedule, performance still improves by 0.61%.
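The abstract does not spell out how the fetch schedule is optimized or how instructions are partitioned. The Python sketch below is only a rough illustration of the two ideas, and the choice of techniques is an assumption. Fetch reordering is modeled as critical-path (dependence-height) list scheduling within one atomic block. Partitioning is modeled as a greedy assignment that places each instruction on the cluster holding most of its producers, subject to a balanced per-cluster capacity. The `Inst` record, the latencies, the two-cluster setup, and all function names are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inst:
    """One instruction in a hypothetical atomic block (frame)."""
    name: str
    latency: int  # assumed execution latency in cycles
    srcs: List[str] = field(default_factory=list)  # producers inside this block only


def critical_heights(block: List[Inst]) -> Dict[str, int]:
    """Longest latency path from each instruction to the end of the block."""
    consumers: Dict[str, List[str]] = {i.name: [] for i in block}
    for inst in block:
        for src in inst.srcs:
            consumers[src].append(inst.name)
    height: Dict[str, int] = {}
    # Reverse program order visits every consumer before its producers.
    for inst in reversed(block):
        below = max((height[c] for c in consumers[inst.name]), default=0)
        height[inst.name] = inst.latency + below
    return height


def reorder_for_fetch(block: List[Inst]) -> List[str]:
    """List-schedule the block: among ready instructions, fetch the tallest first."""
    height = critical_heights(block)
    position = {inst.name: k for k, inst in enumerate(block)}
    fetched: set = set()
    remaining = list(block)
    order: List[str] = []
    while remaining:
        # An instruction is ready once all its in-block producers are fetched.
        ready = [i for i in remaining if all(s in fetched for s in i.srcs)]
        # Highest dependence height wins; ties keep original program order.
        best = max(ready, key=lambda i: (height[i.name], -position[i.name]))
        order.append(best.name)
        fetched.add(best.name)
        remaining.remove(best)
    return order


def partition(order: List[str], block: List[Inst], n_clusters: int = 2) -> Dict[str, int]:
    """Greedily place each instruction on the cluster holding most of its producers."""
    by_name = {inst.name: inst for inst in block}
    capacity = -(-len(block) // n_clusters)  # ceiling division: balanced cap
    load = [0] * n_clusters
    cluster: Dict[str, int] = {}
    for name in order:  # order respects dependences, so producers are already placed
        votes = [0] * n_clusters
        for src in by_name[name].srcs:
            votes[cluster[src]] += 1
        open_clusters = [c for c in range(n_clusters) if load[c] < capacity]
        # Prefer producer locality, then the lighter cluster, then the lower index.
        best = max(open_clusters, key=lambda c: (votes[c], -load[c], -c))
        cluster[name] = best
        load[best] += 1
    return cluster


# Hypothetical block in program order: a short add chain and a long-latency load chain.
example = [
    Inst("add1", 1),
    Inst("add2", 1, ["add1"]),
    Inst("ld_b", 3),
    Inst("mul", 3, ["ld_b"]),
    Inst("st", 1, ["add2", "mul"]),
]

if __name__ == "__main__":
    fetch_order = reorder_for_fetch(example)
    print(fetch_order)  # ['ld_b', 'mul', 'add1', 'add2', 'st']
    print(partition(fetch_order, example))  # {'ld_b': 0, 'mul': 0, 'add1': 1, 'add2': 1, 'st': 0}
```

In this toy block, the long-latency load chain is hoisted ahead of the independent adds. The two dependence chains then land on different clusters. The final store has one producer on each cluster, so it goes to cluster 0 by the lower-index tie-break.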
Similar resources
Out-of-Order Instruction Fetch Using Multiple Sequencers
Conventional instruction fetch mechanisms fetch contiguous blocks of instructions in each cycle. They are difficult to scale since taken branches make it hard to increase the size of these blocks beyond eight instructions. Trace caches have been proposed as a solution to this problem, but they use cache space inefficiently. We show that fetching large blocks of contiguous instructions, or wide ...
Fetch Gating Control Through Speculative Instruction Window Weighting
In a dynamic reordering superscalar processor, the front-end fetches instructions and places them in the issue queue. Instructions are then issued by the back-end execution core. Till recently, the front-end was designed to maximize performance without considering energy consumption. The front-end fetches instructions as fast as it can until it is stalled by a filled issue queue or some other b...
Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures (IEEE/ACM International Symposium on Microarchitecture, 1996)
To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, calle...
Fast approximately timed simulation
In this paper we present a technique for fast approximately timed simulation of software within a virtual prototyping framework. Our method performs a static analysis of the program control flow graph to construct annotations of the simulated program, combined with dynamic performance information. The static analysis estimates execution time based on a target architecture model. The delays intr...
Very-Wide-Issue Superscalar Microengine Configurations
To continue microprocessor performance improvements made in the last 2 decades, instruction-level parallelism must be exploited across multiple basic block boundaries. This necessity has led to execution engines which dynamically predict a stream of instructions which are executed concurrently. As issue widths increase, former assumptions about requirements for execution resources such as inter...
Publication date: 2001